Forty three-option multiple choice (MC) statements on
a midterm examination were converted to 120 true-false (TF)
statements, identical in content. Test forms (MC and TF) were
randomly administered to 50 undergraduates to investigate the
validity and internal consistency reliability of the two forms. A
Kuder-Richardson formula 20 reliability was computed for each form.
Reliability of the MC form was then adjusted with the Spearman-Brown
formula to equate testing time, since the MC form took three-fourths
as much time to complete as the TF form. Adjusted reliability
coefficients of the TF and MC forms were .80 and .73, respectively.
To compare validity, a Pearson product moment correlation was
computed between test score and grade point average; validity
coefficients were .49 (TF) and .52 (MC). Results support the use of
TF teacher-made tests as alternatives to MC tests with no loss in
reliability or validity. However, as previous studies have shown,
these results are not obtained when MC items are revised and the
range of the TF form is restricted.

The advantages and disadvantages of multiple-choice and true-false
formats have been studied by a number of investigators (e.g., Nunnally,
1964; Karmel, 1970; Blood & Budd, 1972). One advantage claimed for
the true-false (TF) test is that it allows a more efficient sampling
of course content. Proponents of the multiple-choice (MC) format,
however, argue that this advantage may be offset by lowered reliability
coefficients that occur primarily as the result of guessing on TF
items. Evidence is available to support the contention that reliability
(Oosterhof & Glasnapp, 1974), if not validity (Frisbie, 1973, 1974), of
the MC test form is higher than that of the TF test form.

Empirical comparisons of the TF and MC formats, however, have been
beset by a number of methodological problems. One serious difficulty
is that greater care is often given to the preparation of MC items
than to TF items. For example, more MC than TF tests have been item
analyzed and revised prior to their being compared (Frisbie, 1973, 1974;
Oosterhof & Glasnapp, 1974). It would be expected that if MC items
were more extensively revised than TF items, the reliability of the MC
form would be higher. The present study employed teacher-made tests in
which the MC and TF items were not differentially improved.

To further complicate matters, TF items are often constructed
from parallel MC items on either a one-to-one or a two-to-one basis,
with either one or two questions being generated from each MC item.
Resultant test forms then have contained as many or twice as many TF
as MC items. However, even at a ratio of 2:1 TF:MC questions, an
estimate of the hypothetically lengthened TF form reliability was
necessary to equate testing time (Oosterhof & Glasnapp, 1974).
Frisbie (1973) suggested that use of a longer TF test would probably
increase the variance of the TF scores and produce a better estimate of
the relationship between MC and TF forms. The present study converted
each three-option MC statement to three separate parallel TF statements,
obviating the need to adjust the reliability of the TF form to equate
testing time and providing for increased variance of the TF scores.

The expectations for this study were:

1. When MC items are converted to TF items, the internal
consistency reliabilities of the two forms do not differ
significantly.

2. When testing time is equated, reliabilities of the two forms
do not differ significantly.

3. Validity of the two forms does not differ significantly.

To reduce likelihood of Type II errors, null hypotheses required
rejection at the .10 rather than the .05 level.

METHOD

Subjects 

This study was conducted during the summer quarter of 1977.
Fifty undergraduates enrolled in a required introductory class in
tests and measurement at the University of Washington served as subjects.
Subjects were naive at the time of testing regarding the nature of
reliability and the relationship between MC and TF item formats.

Instrumentation

A MC and a TF form of a midterm examination were constructed for
the class. Items on the two forms differed in format only, the content
being identical in every instance. Each MC question was converted to
three TF questions (two false TF statements and one true TF statement).
To ensure that corresponding MC and TF items were as comparable as
possible in reading time and in other aspects, the stem was included
in each option of the MC items as it necessarily was in each TF item.
The conversion process is illustrated by the following example.

MC item: Circle the letter corresponding to the best statement
         for each item.

    a. The mode may have more than one value in the same
       distribution.

    b. The range may have more than one value in the same
       distribution.

    c. The standard deviation may have more than one value
       in the same distribution.

TF item: Circle "T" if the statement is true and "F" if the
         statement is false.

    1. T  F  The mode may have more than one value in the
             same distribution.

    2. T  F  The range may have more than one value in the
             same distribution.

    3. T  F  The standard deviation may have more than one
             value in the same distribution.



Each test was then divided into three equal parts. Each of the
three TF items corresponding to a MC item was placed on a separate
part. This was done in an attempt to minimize dependency between
response to a TF item and response to a similar previous item.
Items on each part were then randomly ordered. All students were
instructed to complete part one and hand it in, then to pick up and
complete the second and then the third part. Items stressed
application and interpretation of concepts rather than memorization of
facts. The MC midterm consisted of 40 questions, with 14, 13 and 13
items on the three parts; the TF midterm had 120 questions, 40 items on
each part. Test forms were randomly ordered and distributed to
students.
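The construction of the three TF parts can be sketched in a few lines. This is only an illustration: the text says that the three statements from one MC item went on separate parts and that items within a part were randomly ordered, but it does not say how statements were assigned to parts, so the random assignment below is an assumption, as are the function name `build_tf_parts` and the fixed seed.

```python
import random


def build_tf_parts(n_mc_items, n_options=3, seed=0):
    """Place the n_options TF statements from each MC item on different parts."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n_options)]
    for item in range(1, n_mc_items + 1):
        options = list(range(n_options))
        rng.shuffle(options)              # assumed: random choice of part per statement
        for part, option in zip(parts, options):
            part.append((item, option))   # (MC item number, option index)
    for part in parts:
        rng.shuffle(part)                 # random item order within each part
    return parts


parts = build_tf_parts(40)
print([len(p) for p in parts])
```

Each part then holds one statement from every MC item, so no part contains two statements built from the same stem.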
The class was given 90 minutes to complete the exam. Since
all subjects finished within this time, speed was not considered to
be a factor influencing performance. Subjects were interrupted
after 12 minutes and asked to circle the number of the item on
which they were working. These data were used to determine the
number of TF items answered per MC item.

The students' cumulative grade points were used as an external
criterion from which a concurrent validity coefficient was calculated.

RESULTS

A Kuder-Richardson formula 20 reliability coefficient was
computed for each of the two test forms. The reliability of the
MC test was then adjusted with the Spearman-Brown formula to equate
testing time. Since subjects responded to 1.19 MC items and 2.85 TF
items per minute, the TF test took 1.25 times as long to complete
as the MC test. The reliability estimate of the MC form was
adjusted by this factor. Means, standard deviations, reliability
coefficients, and the correlation between each test form and GPA
are presented in Table 1. A statistical test of the hypothesis that
reliability coefficients associated with two different measurement
procedures are equal has been developed and empirically examined
by Feldt (1969). The statistic is based on the assumption that the
scores on k parallel parts of a test instrument conform to the
assumptions of the two-factor random model of analysis of variance:
a normally distributed population randomly sampled and homogeneity
of variance for the k parts of the test. The difference between
the reliabilities of the MC and TF forms was tested using this
statistic and was not found to be significant (W = 1.35 with 23 and
24 degrees of freedom).
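As a rough check on these figures, the time adjustment and the reliability comparison can be reproduced in a few lines. The sketch below is not the authors' own computation. It assumes the standard Spearman-Brown prophecy formula, and it takes Feldt's statistic to be W = (1 - r_MC) / (1 - r_TF), a form that reproduces the reported 1.35 when the rounded coefficients are used. The unadjusted MC reliability of .68 and the TF reliability of .80 are read from Table 1 and the abstract; the function names are illustrative only.

```python
def time_factor(n_tf, tf_per_min, n_mc, mc_per_min):
    """Ratio of estimated TF completion time to MC completion time."""
    return (n_tf / tf_per_min) / (n_mc / mc_per_min)


def spearman_brown(r, k):
    """Projected reliability of a test lengthened by a factor of k."""
    return k * r / (1 + (k - 1) * r)


def feldt_w(r_a, r_b):
    """Assumed form of Feldt's (1969) W: ratio of the (1 - reliability) terms."""
    return (1 - r_a) / (1 - r_b)


k = time_factor(120, 2.85, 40, 1.19)           # TF time relative to MC time
r_mc_adj = spearman_brown(0.68, round(k, 2))   # MC reliability at equal testing time
w = feldt_w(round(r_mc_adj, 2), 0.80)          # adjusted MC form against TF form

print(f"time factor = {k:.2f}")
print(f"adjusted MC reliability = {r_mc_adj:.2f}")
print(f"W = {w:.2f}")
```

Under these assumptions the three printed values agree with the 1.25, .73 and 1.35 given in the text.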

To compare the validities of the two forms, a Pearson product
moment correlation was calculated between test score and GPA for
each test form (Table 1). The difference was tested using Fisher's
Z transformation, the obtained value of z = .13 not reaching significance.
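The validity comparison can be checked in the same way. The following is a minimal sketch that assumes the usual test for two independent correlations, in which each r is transformed with Fisher's r-to-z and the difference is divided by sqrt(1/(n1 - 3) + 1/(n2 - 3)). The coefficients .49 (TF) and .52 (MC) and the 25 examinees per form are taken from the abstract and Table 1; the function names are illustrative.

```python
import math


def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    z1, z2 = math.atanh(r1), math.atanh(r2)        # Fisher r-to-z transformation
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))    # standard error of z1 - z2
    return (z1 - z2) / se


def two_sided_p(z):
    """Two-sided p-value under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


z = fisher_z_difference(0.52, 25, 0.49, 25)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.2f}")
```

With these inputs z rounds to .13, matching the reported value, and the p-value lies far above the .10 criterion adopted for the study.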

                               Table 1

Item                       # of    Unadjusted    Adjusted     r with
Format    Mean     s    N  items   KR20 (W)      KR20 (W)     GPA (z)

TF       85.80   9.80  25   120   [illegible]      .80          .49

MC       29.12   4.30  25    40      .68           .73          .52


The ratio of TF to MC items responded to per unit time in this
study (2.4) differs from those reported by Oosterhof and Glasnapp (1.73)
and Frisbie (1.5) in 1974. This finding supports Frisbie's (1973)
statement that different student groups and different examination
topics may produce variant response patterns. Since three TF items
are theoretically equivalent to one MC item, one would posit that
the ratio of TF:MC items answered per unit time should approach the
ratio of number of MC options:1 if reading time were constant. Models
of the optimal number of choices per item have assumed total testing
time to be proportional to the number of choices per item, but
empirical studies, including those referenced above, have shown that
TF and MC item types do not satisfy this assumption.
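The gap between the observed ratio and the proportional-time assumption can be made concrete with the response rates given in the Results (2.85 TF items and 1.19 MC items per minute). The sketch below is illustrative only; the function names are not from the source.

```python
def response_rate_ratio(tf_per_min, mc_per_min):
    """TF items answered for every MC item answered in the same time."""
    return tf_per_min / mc_per_min


def relative_time_per_option(ratio, n_options):
    """Time on one TF statement relative to one MC option (1.0 means equal)."""
    return n_options / ratio


observed = response_rate_ratio(2.85, 1.19)
per_option = relative_time_per_option(observed, 3)

print(f"observed TF:MC ratio = {observed:.1f} (proportional-time models imply 3.0)")
print(f"time per TF statement relative to one MC option = {per_option:.2f}")
```

A ratio below 3 means that a single TF statement took longer than one option of a three-option MC item, which is why the TF form needed more total time.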



The TF:MC ratio found in this study is higher than that found
in previous studies. This suggests that students spent comparatively
less time per TF item or comparatively more time per MC item. The
format of the MC items differs from that standardly used and may
have slowed students' processing of the MC items.

Results indicate that reliabilities of the TF forms can be as
high as reliabilities of three-option MC forms and that TF forms can
be as effective in measuring classroom achievement. These results
contradict other findings regarding the reliability of the TF form.
This difference may be due in part to the use in this study of a TF
test which was longer than the MC test and allowed a better estimate
of reliability for this form. The smaller number of items on the MC
form was likely a factor in its lower obtained reliability. Scores
were not corrected for guessing and the range of the TF form was not
restricted as it had been in previous studies. Rather, the range
of the MC form was comparatively restricted. Differing also was the
method of comparing formats, the present study employing all options
of corresponding MC items as TF items with no known initial biases
in item discrimination. Another factor favoring heightened reliability
of the TF form was the proportion of items keyed false (.67, two
false statements for each true one). Frisbie (1974) suggested that
false items generally discriminate better than true items, 60% false
being suggested as a possible optimum (Ebel, 1972).

Results of this study provide support for the use of TF
teacher-made tests as alternatives to MC tests with no loss in
reliability or validity. However, as previous studies have shown,
these results are not obtained when MC items have been subjected to
revision and the range of the TF form is restricted. It is suggested
that a further comparison of formats be made in which both types of
items have been improved and matched for difficulty and discrimination
levels. Also, further investigation is suggested varying the number
of TF items used as MC options and varying the ratio of false to
true TF statements.


